MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Accurate self-correction of errors in long reads using de Bruijn graphs

Internal identifier: 000073 (France/Analysis); previous: 000072; next: 000074


Authors: Leena Salmela [Finland]; Riku Walve [Finland]; Eric Rivals [France]; Esko Ukkonen [Finland]

Source:

RBID : PMC:5351550

French descriptors

English descriptors

Abstract

Motivation

New long-read sequencing technologies, like PacBio SMRT and Oxford Nanopore, can produce sequencing reads up to 50 000 bp long but with an error rate of at least 15%. Reducing the error rate is necessary for subsequent use of the reads in, e.g., de novo genome assembly. The error correction problem has been tackled either by aligning the long reads against each other or by a hybrid approach that uses the more accurate short reads produced by second-generation sequencing technologies to correct the long reads.
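A back-of-the-envelope calculation shows why an error rate of this size is hard on k-mer based methods. It assumes, as a simplification, that each base is wrong independently with probability p = 0.15. Real long-read errors are neither independent nor uniformly spread. Under that assumption, a read of length L carries pL errors on average, and a given k-mer is error-free with probability (1-p)^k:

$$
\mathbb{E}[\#\text{errors}] = pL = 0.15 \times 50\,000 = 7\,500,
\qquad
\Pr[\text{a given } k\text{-mer is error-free}] = (1-p)^{k},
$$
$$
(0.85)^{15} \approx 0.087, \qquad (0.85)^{31} \approx 0.0065.
$$

Under this model only about one 31-mer in 150 would be error-free. This suggests why the correction described in the Results starts from shorter k-mers and lengthens them once the reads have been partially corrected.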

Results

We present an error correction method that uses long reads only. The method consists of two phases: first, we use an iterative alignment-free correction method based on de Bruijn graphs with increasing length of k-mers, and second, the corrected reads are further polished using long-distance dependencies that are found using multiple alignments. According to our experiments, the proposed method is the most accurate one relying on long reads only for read sets with high coverage. Furthermore, when the coverage of the read set is at least 75×, the throughput of the new method is at least 20% higher.
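The first phase can be illustrated with a toy Python sketch. It is not LoRMA's implementation, and it rests on several simplifying assumptions:

- k-mers seen at least a hypothetical `min_count` times across the reads are treated as "solid", and the de Bruijn graph is built implicitly from them.
- Each weak stretch between two solid k-mers in a read is replaced by the shortest graph path joining them. Real tools may select paths differently, e.g. by similarity to the read segment.
- The function names, the `slack` bound on path length and the small values in `ks` are illustrative choices.

The second phase, polishing with multiple alignments, is not shown.

```python
from collections import Counter, deque


def solid_kmers(reads, k, min_count):
    """Count every k-mer in the reads; keep those seen at least min_count times."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return {kmer for kmer, c in counts.items() if c >= min_count}


def successors(kmer, solid):
    """Out-neighbours of a k-mer in the de Bruijn graph of the solid k-mers."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in solid]


def bridge(src, dst, solid, max_len):
    """Breadth-first search for the shortest graph path from src to dst.

    Returns the bases spelled after src (ending with the last base of dst),
    or None if dst cannot be reached within max_len extra bases.
    """
    queue = deque([(src, "")])
    seen = {(src, 0)}
    while queue:
        node, spelled = queue.popleft()
        if len(spelled) >= max_len:
            continue
        for nxt in successors(node, solid):
            path = spelled + nxt[-1]
            if nxt == dst:
                return path
            if (nxt, len(path)) not in seen:
                seen.add((nxt, len(path)))
                queue.append((nxt, path))
    return None


def correct_read(read, k, solid, slack=1.5):
    """Replace each weak stretch between two solid k-mers by a bridging path."""
    anchors = [i for i in range(len(read) - k + 1) if read[i:i + k] in solid]
    if not anchors:
        return read  # nothing trustworthy to anchor on: leave the read unchanged
    out = read[:anchors[0] + k]  # prefix up to the end of the first solid k-mer
    for a, b in zip(anchors, anchors[1:]):
        if b == a + 1:
            out += read[b + k - 1]  # adjacent solid k-mers: copy one base
            continue
        gap = b - a  # bases from the end of k-mer a to the end of k-mer b
        path = bridge(read[a:a + k], read[b:b + k], solid, int(gap * slack) + 1)
        # Fall back to the original bases when no bridging path is found.
        out += path if path is not None else read[a + k:b + k]
    return out + read[anchors[-1] + k:]  # suffix after the last solid k-mer


def iterative_correct(reads, ks=(5, 7), min_count=2):
    """Correct every read once per k, rebuilding the graph with longer k-mers."""
    for k in ks:
        solid = solid_kmers(reads, k, min_count)
        reads = [correct_read(r, k, solid) for r in reads]
    return reads
```

As a usage example, take three copies of `ATGGCGTACGCTTAGGATCC` plus one copy carrying a single substitution, `ATGGCGTACGATTAGGATCC`. Calling `iterative_correct(reads, ks=(5, 7), min_count=2)` returns four copies of the error-free sequence. The k-mers covering the substituted base occur only once, so they are weak, and the graph path between the flanking solid 5-mers restores the original bases.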

Availability and Implementation

LoRMA is freely available at http://www.cs.helsinki.fi/u/lmsalmel/LoRMA/.


Url:
DOI: 10.1093/bioinformatics/btw321
PubMed: 27273673
PubMed Central: 5351550


Affiliations:



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Accurate self-correction of errors in long reads using de Bruijn graphs</title>
<author>
<name sortKey="Salmela, Leena" sort="Salmela, Leena" uniqKey="Salmela L" first="Leena" last="Salmela">Leena Salmela</name>
<affiliation wicri:level="4">
<nlm:aff id="btw321-aff1">Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki</wicri:regionArea>
<orgName type="university">Université d'Helsinki</orgName>
<placeName>
<settlement type="city">Helsinki</settlement>
<region type="région" nuts="2">Uusimaa</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Walve, Riku" sort="Walve, Riku" uniqKey="Walve R" first="Riku" last="Walve">Riku Walve</name>
<affiliation wicri:level="4">
<nlm:aff id="btw321-aff1">Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki</wicri:regionArea>
<orgName type="university">Université d'Helsinki</orgName>
<placeName>
<settlement type="city">Helsinki</settlement>
<region type="région" nuts="2">Uusimaa</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Rivals, Eric" sort="Rivals, Eric" uniqKey="Rivals E" first="Eric" last="Rivals">Eric Rivals</name>
<affiliation wicri:level="3">
<nlm:aff id="btw321-aff2">LIRMM and Institut de Biologie Computationelle, CNRS and Université Montpellier, Montpellier, France</nlm:aff>
<country xml:lang="fr">France</country>
<wicri:regionArea>LIRMM and Institut de Biologie Computationelle, CNRS and Université Montpellier, Montpellier</wicri:regionArea>
<placeName>
<region type="region">Occitanie (région administrative)</region>
<region type="old region">Languedoc-Roussillon</region>
<settlement type="city">Montpellier</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Ukkonen, Esko" sort="Ukkonen, Esko" uniqKey="Ukkonen E" first="Esko" last="Ukkonen">Esko Ukkonen</name>
<affiliation wicri:level="4">
<nlm:aff id="btw321-aff1">Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki</wicri:regionArea>
<orgName type="university">Université d'Helsinki</orgName>
<placeName>
<settlement type="city">Helsinki</settlement>
<region type="région" nuts="2">Uusimaa</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">27273673</idno>
<idno type="pmc">5351550</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5351550</idno>
<idno type="RBID">PMC:5351550</idno>
<idno type="doi">10.1093/bioinformatics/btw321</idno>
<date when="2016">2016</date>
<idno type="wicri:Area/Pmc/Corpus">000B15</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000B15</idno>
<idno type="wicri:Area/Pmc/Curation">000B15</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000B15</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000C58</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000C58</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:27273673</idno>
<idno type="wicri:Area/PubMed/Corpus">001091</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001091</idno>
<idno type="wicri:Area/PubMed/Curation">001091</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001091</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000E00</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">000E00</idno>
<idno type="wicri:Area/Ncbi/Merge">001653</idno>
<idno type="wicri:Area/Ncbi/Curation">001653</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001653</idno>
<idno type="wicri:doubleKey">1367-4803:2016:Salmela L:accurate:self:correction</idno>
<idno type="wicri:Area/Main/Merge">001394</idno>
<idno type="wicri:Area/Main/Curation">001389</idno>
<idno type="wicri:Area/Main/Exploration">001389</idno>
<idno type="wicri:Area/France/Extraction">000073</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Accurate self-correction of errors in long reads using de Bruijn graphs</title>
<author>
<name sortKey="Salmela, Leena" sort="Salmela, Leena" uniqKey="Salmela L" first="Leena" last="Salmela">Leena Salmela</name>
<affiliation wicri:level="4">
<nlm:aff id="btw321-aff1">Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki</wicri:regionArea>
<orgName type="university">Université d'Helsinki</orgName>
<placeName>
<settlement type="city">Helsinki</settlement>
<region type="région" nuts="2">Uusimaa</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Walve, Riku" sort="Walve, Riku" uniqKey="Walve R" first="Riku" last="Walve">Riku Walve</name>
<affiliation wicri:level="4">
<nlm:aff id="btw321-aff1">Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki</wicri:regionArea>
<orgName type="university">Université d'Helsinki</orgName>
<placeName>
<settlement type="city">Helsinki</settlement>
<region type="région" nuts="2">Uusimaa</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Rivals, Eric" sort="Rivals, Eric" uniqKey="Rivals E" first="Eric" last="Rivals">Eric Rivals</name>
<affiliation wicri:level="3">
<nlm:aff id="btw321-aff2">LIRMM and Institut de Biologie Computationelle, CNRS and Université Montpellier, Montpellier, France</nlm:aff>
<country xml:lang="fr">France</country>
<wicri:regionArea>LIRMM and Institut de Biologie Computationelle, CNRS and Université Montpellier, Montpellier</wicri:regionArea>
<placeName>
<region type="region">Occitanie (région administrative)</region>
<region type="old region">Languedoc-Roussillon</region>
<settlement type="city">Montpellier</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Ukkonen, Esko" sort="Ukkonen, Esko" uniqKey="Ukkonen E" first="Esko" last="Ukkonen">Esko Ukkonen</name>
<affiliation wicri:level="4">
<nlm:aff id="btw321-aff1">Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki</wicri:regionArea>
<orgName type="university">Université d'Helsinki</orgName>
<placeName>
<settlement type="city">Helsinki</settlement>
<region type="région" nuts="2">Uusimaa</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformatics</title>
<idno type="ISSN">1367-4803</idno>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2016">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Escherichia coli (genetics)</term>
<term>Genome</term>
<term>High-Throughput Nucleotide Sequencing (methods)</term>
<term>Saccharomyces cerevisiae (genetics)</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN (méthodes)</term>
<term>Escherichia coli (génétique)</term>
<term>Génome</term>
<term>Logiciel</term>
<term>Saccharomyces cerevisiae (génétique)</term>
<term>Séquençage nucléotidique à haut débit (méthodes)</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Escherichia coli</term>
<term>Saccharomyces cerevisiae</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr">
<term>Escherichia coli</term>
<term>Saccharomyces cerevisiae</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>High-Throughput Nucleotide Sequencing</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Genome</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Génome</term>
<term>Logiciel</term>
<term>Séquençage nucléotidique à haut débit</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<title>Abstract</title>
<sec id="SA1">
<title>Motivation</title>
<p>New long read sequencing technologies, like PacBio SMRT and Oxford NanoPore, can produce sequencing reads up to 50 000 bp long but with an error rate of at least 15%. Reducing the error rate is necessary for subsequent utilization of the reads in, e.g.
<italic>de novo</italic>
genome assembly. The error correction problem has been tackled either by aligning the long reads against each other or by a hybrid approach that uses the more accurate short reads produced by second generation sequencing technologies to correct the long reads.</p>
</sec>
<sec id="SA2">
<title>Results</title>
<p>We present an error correction method that uses long reads only. The method consists of two phases: first, we use an iterative alignment-free correction method based on de Bruijn graphs with increasing length of
<italic>k</italic>
-mers, and second, the corrected reads are further polished using long-distance dependencies that are found using multiple alignments. According to our experiments, the proposed method is the most accurate one relying on long reads only for read sets with high coverage. Furthermore, when the coverage of the read set is at least 75×, the throughput of the new method is at least 20% higher.</p>
</sec>
<sec id="SA3">
<title>Availability and Implementation</title>
<p>LoRMA is freely available at
<ext-link ext-link-type="uri" xlink:href="http://www.cs.helsinki.fi/u/lmsalmel/LoRMA/">http://www.cs.helsinki.fi/u/lmsalmel/LoRMA/</ext-link>
.</p>
</sec>
</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Finlande</li>
<li>France</li>
</country>
<region>
<li>Languedoc-Roussillon</li>
<li>Occitanie (région administrative)</li>
<li>Uusimaa</li>
</region>
<settlement>
<li>Helsinki</li>
<li>Montpellier</li>
</settlement>
<orgName>
<li>Université d'Helsinki</li>
</orgName>
</list>
<tree>
<country name="Finlande">
<region name="Uusimaa">
<name sortKey="Salmela, Leena" sort="Salmela, Leena" uniqKey="Salmela L" first="Leena" last="Salmela">Leena Salmela</name>
</region>
<name sortKey="Ukkonen, Esko" sort="Ukkonen, Esko" uniqKey="Ukkonen E" first="Esko" last="Ukkonen">Esko Ukkonen</name>
<name sortKey="Walve, Riku" sort="Walve, Riku" uniqKey="Walve R" first="Riku" last="Walve">Riku Walve</name>
</country>
<country name="France">
<region name="Occitanie (région administrative)">
<name sortKey="Rivals, Eric" sort="Rivals, Eric" uniqKey="Rivals E" first="Eric" last="Rivals">Eric Rivals</name>
</region>
</country>
</tree>
</affiliations>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/France/Analysis
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000073 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/France/Analysis/biblio.hfd -nk 000073 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    France
   |étape=   Analysis
   |type=    RBID
   |clé=     PMC:5351550
   |texte=   Accurate self-correction of errors in long reads using de Bruijn graphs
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/France/Analysis/RBID.i   -Sk "pubmed:27273673" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/France/Analysis/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021